SCISSOR- Auto-Extraction Tool to Boost Document Scanning
Authors
Abstract
Similar Resources
Single-Document Keyphrase Extraction for Multi-Document Keyphrase Extraction
Here, we address the task of assigning relevant terms to thematically and semantically related sub-corpora and achieve superior results compared to the baseline performance. Our results suggest that more reliable sets of keyphrases can be assigned to the semantically and thematically related subsets of some corpora if the automatically determined sets of keyphrases for the individual documents ...
Using Information Extraction to Improve Document Retrieval
We describe an approach to applying a particular kind of Natural Language Processing (NLP) system to the TREC routing task in Information Retrieval (IR). Rather than attempting to use NLP techniques in indexing documents in a corpus, we adapted an information extraction (IE) system to act as a post-filter on the output of an IR system. The IE system was configured to score each of the top documents as dete...
Scaling Information Extraction to Large Document Collections
Information extraction and text mining applications are just beginning to tap the immense amounts of valuable textual information available online. In order to extract information from millions, and in some cases, billions of documents, different solutions to scalability emerged. We review key approaches for scaling up information extraction, including using general-purpose search engines as we...
Feature extraction using auto-associative neural networks
Modal analysis is now mature and well accepted in the design of mechanical structures. It determines the vibration mode shapes and the corresponding natural frequencies. However, the validity of modal analysis is limited to structures showing a linear behaviour. In non-linear structural dynamics, it is well known that mode shapes are no longer useful for the characterization of the dynamic resp...
Versatile document image content extraction
We offer a preliminary report on a research program to investigate versatile algorithms for document image content extraction, that is locating regions containing handwriting, machine-print text, graphics, line-art, logos, photographs, noise, etc. To solve this problem in its full generality requires coping with a vast diversity of document and image types. Automatically trainable methods are h...
Journal
Journal title: International Journal of Computer Applications
Year: 2015
ISSN: 0975-8887
DOI: 10.5120/19965-1809